Published online 27 May 2011 



Nucleic Acids Research, 2011, Vol. 39, Web Server issue W155-W159 

doi:10.1093/nar/gkr319 



psRNATarget: a plant small RNA target 
analysis server 

Xinbin Dai and Patrick Xuechun Zhao* 

Plant Biology Division, The Samuel Roberts Noble Foundation, 2510 Sam Noble Parkway, Ardmore, 
OK 73401, USA 

Received February 21, 2011; Revised April 19, 2011; Accepted April 20, 2011 



ABSTRACT 

Plant endogenous non-coding short small RNAs 
(20-24 nt), including microRNAs (miRNAs) and a 
subset of small interfering RNAs (ta-siRNAs), play 
important roles in gene expression regulatory 
networks (GRNs). For example, many transcription 
factors and development-related genes have been 
reported as targets of these regulatory small 
RNAs. Although a number of miRNA target predic- 
tion algorithms and programs have been developed, 
most of them were designed for animal miRNAs 
which are significantly different from plant miRNAs 
in the target recognition process. These differences 
demand the development of separate plant miRNA 
(and ta-siRNA) target analysis tool(s). We present 
psRNATarget, a plant small RNA target analysis 
server, which features two important analysis func- 
tions: (i) reverse complementary matching between 
small RNA and target transcript using a proven 
scoring schema, and (ii) target-site accessibility 
evaluation by calculating unpaired energy (UPE) 
required to 'open' secondary structure around 
small RNA's target site on mRNA. The psRNATarget 
incorporates recent discoveries in plant miRNA 
target recognition, e.g. it distinguishes translational 
and post-transcriptional inhibition, and it reports 
the number of small RNA/target site pairs that 
may affect small RNA binding activity to target tran- 
script. The psRNATarget server is designed for 
high-throughput analysis of next-generation data 
with an efficient distributed computing back-end 
pipeline that runs on a Linux cluster. The server 
front-end integrates three simplified user-friendly 
interfaces to accept user-submitted or preloaded 
small RNAs and transcript sequences; and outputs 
a comprehensive list of small RNA/target pairs 
along with the online tools for batch downloading, 



key word searching and results sorting. The 
psRNATarget server is freely available at 
http://plantgrn.noble.org/psRNATarget/. 

INTRODUCTION 

Plant endogenous non-coding short small RNAs (20-24 
nt), including microRNAs (miRNAs) and a subset of 
small interfering RNAs (ta-siRNAs), are derived from 
the cleavage products of double-strand RNAs 
(ds-RNAs) by DICER-like enzymes (1-4). These regulatory 
small RNAs (mainly miRNAs and 
ta-siRNAs, sic passim) negatively regulate gene expression 
at post-transcriptional level by directing the cleavage of 
target transcript (mRNA) (5). Several transcription 
factors and development-related genes have been 
reported as targets of these regulatory small RNAs, and 
together they play key roles in gene expression regulatory 
network controlling plant growth and development (6). 
These discoveries have aroused wide interest in, and urgent 
demand for, genome-wide analysis of small RNAs and 
dissection of their functions, for example, identifying their 
regulatory target genes in plants. 

A number of algorithms and programs have been 
introduced to search target genes for miRNAs (7-9). 
However, most of them were developed for animal 
miRNAs which are significantly different from plant 
miRNAs in the target recognition process (9,10). For 
example, an animal miRNA generally requires loose complementarity 
in about the first eight nucleotides of the 
miRNA, while a plant miRNA demands that the whole 
mature miRNA sequence be near-perfectly aligned 
with its mRNA target. Secondly, an animal miRNA 
tends to inhibit its target gene's expression at the translational 
level, whereas a plant miRNA directly cleaves its target 
transcript. In addition, an animal miRNA tends to recognize 
the 3'-UTR region of its target mRNA, whereas a plant 
miRNA usually has no preference in terms of position. 
Recent discoveries suggest that plant miRNA may 
inhibit gene expression at the translational level (11), 
though it seems to utilize a different recognition pattern 
compared with a typical animal miRNA's action (see the 
next section). These differences therefore demand the de- 
velopment of separate plant small RNA target analysis 
tools. 

A regular expression (a computer language that describes 
matching strings in text)-like pattern matching 
program, PatScan (12), was adopted in identifying the 
miRNA targets in rice and Arabidopsis (13,14). To 
use PatScan, users first need to prepare regular 
expression-style pattern sets for an input miRNA 
sequence, then search for target sequences with matching 
pattern(s) in a candidate mRNA sequence dataset. However, 
the PatScan program was not designed for end users 
such as biologists; extra training and programming skills 
are usually required to successfully install this 
UNIX-style program and generate comprehensive pattern 
sets for large-scale systematic miRNA target analysis. 
Xie et al. (15) developed a BLAST-based target search 
program, miRNAassist. More recently, Fahlgren and 
Carrington (16) described a pipeline for plant miRNA 
target prediction using the FASTA program and Perl 
scripts for matching and scoring. These programs 
require local installation on a standalone computer and 
are not designed for high-throughput computing; they 
are therefore not suitable for genome-scale analysis. In 
addition, the performance of BLAST-like programs 
is controversial; for example, our study indicated that 
NCBI BLASTN may miss up to 70% of potential targets, 
because these programs were designed to align long 
sequences, such as the Expressed Sequence Tags (ESTs) 
and genomic sequences (8). 
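The pattern-based approach described above can be illustrated with a minimal Python sketch. It uses Python's `re` module rather than PatScan's own pattern syntax, and the function names `target_pattern` and `find_sites` are hypothetical. The sketch assumes that only Watson-Crick and G:U wobble pairs are allowed, which shows why a single strict pattern misses sites containing other mismatches.

```python
import re

# For each small RNA base, the target bases it can pair with:
# the Watson-Crick partner plus the G:U wobble partner where one exists.
PAIRING = {"A": "U", "C": "G", "G": "[CU]", "U": "[AG]"}


def target_pattern(small_rna: str) -> re.Pattern:
    """Compile a regex matching target sites (5'->3') that pair with small_rna."""
    # The target pairs antiparallel, so walk the small RNA from 3' to 5'.
    return re.compile("".join(PAIRING[b] for b in reversed(small_rna.upper())))


def find_sites(small_rna: str, transcript: str) -> list[int]:
    """Return 1-based start positions of all matching sites in the transcript."""
    pattern = target_pattern(small_rna)
    return [m.start() + 1 for m in pattern.finditer(transcript.upper())]
```

A site with even one non-wobble mismatch is not found by this pattern, so a realistic search would need a whole set of patterns per miRNA, one for each tolerated mismatch layout.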

Zhang (19) introduced an online analysis tool, miRU, 
for plant miRNA target analysis. miRU adopted the 
Smith-Waterman algorithm to search for the optimal alignment; 
it also provides a simple user-friendly web interface 
and outputs an easily understandable list of matching 
results, which makes miRU a popular plant miRNA 
target analysis tool (17,18). However, miRU only accepts 
one sequence at a time for analysis (lacks high-throughput 
analysis capability) and the user can only search target 
candidates in the preloaded libraries. These shortcomings 
have limited its application in genomic studies, such as 
analyzing the popular next-generation sequences (19). 

The above-mentioned plant miRNA target analysis 
tools generally focus on the complementarity between 
small RNA and target transcript. The accessibility of 
target site on mRNA to a small RNA, determined by 
secondary structure of mRNA around the target site, 
has been proved to be an important factor in target rec- 
ognition (20-23). Incorporating such target-site accessibil- 
ity evaluation to small RNA target analysis was reported 
to significantly improve the prediction accuracy (6). 

Brodersen et al. (11) reported that miRNA-mediated translational 
inhibition might be widespread in plants. However, to date, 
there is no reported plant small RNA target prediction 
tool that is capable of distinguishing this newly discovered 
mechanism from the well-accepted post-transcriptional 
inhibition. 

Here, we present a new plant small RNA target analysis 
web server, psRNATarget 
(http://plantgrn.noble.org/psRNATarget/). psRNATarget integrates two important 
analysis functions: (i) reverse complementary 
matching between small RNA and target transcript 
using a proven scoring schema, and (ii) target-site acces- 
sibility evaluation by calculating unpaired energy (UPE) 
required to 'open' secondary structure around a small 
RNA target site on the mRNA. This server incorporates 
recent discoveries in plant small RNA target recognition, 
e.g. it distinguishes translational inhibition and post- 
transcriptional inhibition (11), and it reports the number 
of small RNA/target site pairs that are reportedly 
associated with small RNA recognition activity to the tar- 
get transcript. The psRNATarget is designed for high- 
throughput analysis of next-generation data by imple- 
menting a distributed computing pipeline which runs on 
a Linux cluster at back-end. The server front-end inte- 
grates three user-friendly interfaces to accept user- 
submitted or preloaded small RNAs and transcript se- 
quences and outputs a comprehensive list of small 
RNAs and matching target sites on candidate transcripts 
along with built-in online tools for batch downloading, 
key word searching and results filtering. 

PRINCIPLES AND psRNATarget BACK-END 
PIPELINE IMPLEMENTATION 

Complementarity 

psRNATarget evaluates complementarity between small 
RNA and target gene transcript using the scoring schema 
originally applied by miRU (19). Instead of using the 
NCBI BLAST program, we employed a popular Smith- 
Waterman (24) implementation, ssearch (Version 36.x) 
(25), in our back-end pipeline, since the latter more reliably 
finds alignments between very short small 
RNA sequences and the mRNA sequences. 
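As a rough illustration of complementarity scoring, the sketch below computes a penalty for an ungapped small RNA/target site pair. The penalty values (1.0 per mismatch, 0.5 per G:U wobble) and the function name `complementarity_penalty` are assumptions made for illustration; they are not taken from this text, and the sketch omits gaps and any position-dependent weighting. The example sequences are read from the ath-miR156a/AT1G27360.2 alignment in Figure 2, with the miRNA rewritten 5' to 3'.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def complementarity_penalty(small_rna, target_site, mismatch=1.0, wobble=0.5):
    """Sum pairing penalties for an ungapped, equal-length duplex.

    small_rna and target_site are both written 5'->3'; the small RNA is
    reversed so that it pairs antiparallel with the target site.
    The default penalty values are illustrative assumptions.
    """
    if len(small_rna) != len(target_site):
        raise ValueError("sequences must have the same length")
    penalty = 0.0
    for pair in zip(reversed(small_rna), target_site):
        if pair in WATSON_CRICK:
            continue
        penalty += wobble if pair in WOBBLE else mismatch
    return penalty


mir156a = "UGACAGAAGAGAGUGAGCAC"   # 5'->3', from Figure 2
site = "GUGCUCUCUCUCUUCUGUCA"      # AT1G27360.2, positions 1213-1232
print(complementarity_penalty(mir156a, site))
```

Under these assumed penalties the single U:U mismatch in this duplex gives a penalty of 1.0, which happens to agree with the expectation value listed for this pair in Figure 2.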

Multiplicity of target sites 

Another good reason for adopting the ssearch program is 
that it is capable of returning multiple alignments 
(ssearch versions 36.x and later releases) for each small 
RNA/target transcript pair, unlike most other Smith- 
Waterman implementations, which return only the optimal 
alignment. Returning multiple alignments enables 
the reporting of multiple target sites for each small RNA/ 
target transcript pair. This so-called multiplicity of target 
sites is especially relevant to the biogenesis of siRNA 
because existence of dual target sites of a miRNA on a 
specific target transcript has been reported to be an effect- 
ive trigger of ta-siRNA biogenesis from a TAS 
(transacting-siRNA) precursor gene (21,26). 
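The idea of multiplicity can be sketched without ssearch by sliding an ungapped window along the transcript and keeping every site whose penalty stays under a threshold. This is a simplified stand-in for the Smith-Waterman search described above, not the server's method; the penalty values and the names `site_penalty` and `find_target_sites` are assumptions.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def site_penalty(small_rna, site, mismatch=1.0, wobble=0.5):
    """Ungapped pairing penalty (assumed values), small RNA read 3'->5'."""
    total = 0.0
    for pair in zip(reversed(small_rna), site):
        if pair not in WATSON_CRICK:
            total += wobble if pair in WOBBLE else mismatch
    return total


def find_target_sites(small_rna, transcript, max_penalty=3.0):
    """Return (start, end, penalty) for every window within max_penalty.

    Coordinates are 1-based and inclusive on the transcript.
    """
    n = len(small_rna)
    sites = []
    for start in range(len(transcript) - n + 1):
        score = site_penalty(small_rna, transcript[start:start + n])
        if score <= max_penalty:
            sites.append((start + 1, start + n, score))
    return sites


def multiplicity(small_rna, transcript, max_penalty=3.0):
    """Number of candidate target sites for one small RNA/transcript pair."""
    return len(find_target_sites(small_rna, transcript, max_penalty))
```

A transcript carrying two qualifying sites for the same small RNA would be reported with a multiplicity of 2, which is the situation relevant to ta-siRNA biogenesis.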

Target site accessibility 

The psRNATarget server analyzes target accessibility 
using the RNAup program in the Vienna RNA Package (27). 
RNAup calculates the UPE, which is the energy required to 
'open' the secondary structure around the target site on the mRNA. 
Lower energy indicates a higher likelihood of being an effective 
target site, because secondary structure may prevent 
the small RNA and the target site from making contact. Considering 
the bulk of the RNA-induced silencing complex (RISC), both the 
5'- and 3'-flanks of the target site are included in the target 
accessibility evaluation (22). 
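The region that has to be 'opened' can be sketched as the target site extended by its two flanks. The snippet below only computes that window and applies a UPE cut-off; it does not call RNAup. The function names are hypothetical, and the defaults (17 nt upstream, 13 nt downstream, UPE at most 25.0) are what the parameter form in Figure 1 appears to show, so they should be treated as assumptions.

```python
def accessibility_window(transcript_length, site_start, site_end,
                         upstream=17, downstream=13):
    """Return the 1-based inclusive (start, end) of the region to unfold.

    The target site is extended by the upstream and downstream flanks and
    clipped at the transcript ends. Default flank sizes are assumptions.
    """
    if not 1 <= site_start <= site_end <= transcript_length:
        raise ValueError("site coordinates must lie within the transcript")
    start = max(1, site_start - upstream)
    end = min(transcript_length, site_end + downstream)
    return start, end


def is_accessible(upe, max_upe=25.0):
    """Keep a site when the energy needed to unpair it is low enough."""
    return upe <= max_upe


# Site 1213-1232 from Figure 2 on a hypothetical 2000-nt transcript.
print(accessibility_window(2000, 1213, 1232))
```

The UPE value itself would come from a folding program such as RNAup run on the extracted window.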

Translational inhibition 

In plants, it has been observed that mismatches occurring 
around the center of the miRNA/mRNA complementary 
region tend to disable the cleavage activity of RISC; 
however, the binding of mRNA to RISC can still block 
gene expression at the translational level (11). The 
psRNATarget server reports translational inhibition 
potential when a mismatch is detected in the central com- 
plementary region of the small RNA sequence. 
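The rule described above can be sketched as a small classifier. The central range of positions 9 to 11 (counted from the 5' end of the small RNA) is a hypothetical default chosen for illustration, as is the decision to treat G:U wobble pairs as paired; the function names are also assumptions. The output labels follow the 'Inhibition' column of Figure 2.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def mismatch_positions(small_rna, target_site):
    """1-based positions, counted from the small RNA 5' end, left unpaired."""
    n = len(small_rna)
    if n != len(target_site):
        raise ValueError("sequences must have the same length")
    positions = []
    # offset 0 pairs the small RNA 3' end with the target site 5' end.
    for offset, pair in enumerate(zip(reversed(small_rna), target_site)):
        if pair not in WATSON_CRICK and pair not in WOBBLE:
            positions.append(n - offset)
    return sorted(positions)


def inhibition_mode(small_rna, target_site, central_range=(9, 11)):
    """Return 'Translation' if any mismatch falls in the central range."""
    low, high = central_range  # hypothetical default range
    central = [p for p in mismatch_positions(small_rna, target_site)
               if low <= p <= high]
    return "Translation" if central else "Cleavage"
```

For the ath-miR156a/AT1G27360.2 pair in Figure 2, the only mismatch sits at position 14 of the miRNA, outside the assumed central range, so the sketch reports cleavage.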

Parallel computing analysis pipeline 

Both the Smith-Waterman-based alignment using the 
ssearch program and the UPE calculation using the 
RNAup program are computationally very intensive. 
Although these programs produce much more accurate 
results, they significantly impact the analysis throughput. 

We greatly enhanced the analysis throughput by de- 
veloping an efficient back-end pipeline on the basis of an 
in-house developed distributed computing platform, 
namely BioGrid. Upon user submission, the master node 
of BioGrid system divides the submitted miRNA datasets 
into multiple subsets and transfers these subsets as well as 
specified target transcript library to remote computing 
nodes in a Linux Cluster. Next, the master node remotely 
calls and monitors analytic programs in these computing 
nodes, and finally the master node collects outputs when 
these analysis jobs are completed. The communications 
between master node and computing nodes are mainly 
through the Linux SSH (Secure Shell) channel. The 
psRNATarget back-end system, including the BioGrid 
platform, was written in the Java and Groovy languages. 
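The split, dispatch and collect steps described above can be sketched in a few lines of Python. This is not the BioGrid implementation, which is written in Java and Groovy and communicates over SSH; here a local thread pool stands in for the remote computing nodes, and `split_into_subsets`, `run_pipeline` and `analyze_subset` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_subsets(items, n_subsets):
    """Split items into at most n_subsets contiguous, nearly equal subsets."""
    if n_subsets < 1:
        raise ValueError("n_subsets must be at least 1")
    size, extra = divmod(len(items), n_subsets)
    subsets, start = [], 0
    for i in range(n_subsets):
        end = start + size + (1 if i < extra else 0)
        if end > start:          # skip empty subsets
            subsets.append(items[start:end])
        start = end
    return subsets


def run_pipeline(small_rnas, analyze_subset, n_nodes=4):
    """Master-node role: split the input, dispatch it, collect the results.

    analyze_subset takes a list of small RNAs and returns a list of hits;
    worker threads stand in for remote computing nodes.
    """
    subsets = split_into_subsets(small_rnas, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        results = list(pool.map(analyze_subset, subsets))  # keeps input order
    return [hit for subset_hits in results for hit in subset_hits]
```

Splitting by small RNA works because each small RNA is analyzed against the transcript library independently of the others, so the subsets can be processed in any order and merged at the end.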

WEB INTERFACES OF psRNATarget 
Input 

The server front-end integrates three simplified user- 
friendly interfaces to accept user-submitted sequences 
and selections of preloaded miRNAs and transcript sequences 
for analysis, i.e. (i) searching user-submitted small 
RNAs against preloaded transcripts; (ii) searching preloaded 
small RNAs (miRNAs by species) against user- 
submitted transcripts; and (iii) searching user-submitted 
small RNAs against user-submitted transcripts. 

In each input interface, default parameters are sug- 
gested and preloaded based on our literature analysis; 



however, users may adjust the behavior of back-end pipe- 
line by changing the parameters. Briefly, maximum ex- 
pectation for complementarity and UPE (maximum 
energy to unpair the target site) for target accessibility 
analysis may be decreased to retrieve more stringent pre- 
diction results; more potential target sequences might be 
missed, though (Figure 1, arrows A and C). The current 
default values of both parameters were suggested based on 
our benchmark test (see 'Performance' section). The 
hspsize is the length of scoring region for complementary 
analysis; users are advised to reduce it to the shortest 
length of small RNAs if the submitted small RNAs are 
shorter than default hspsize value (20 nt). Otherwise, those 
small RNAs shorter than hspsize will be skipped in target 
analysis (Figure 1, arrow B). The two flanking sizes 
(lengths of the left and right flanking sequences of the 
target site) are also adjustable in the target accessibility 
analysis (Figure 1, arrow D). The users may also adjust 
the range of central region in which any detected mismatch 
will be considered as a trigger of translational inhibition 
(Figure 1, arrow E). 
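The effect of hspsize on which small RNAs are analyzed can be sketched as follows. The helper names are hypothetical, and the allowed range of 15 to 30 is taken from the parameter form in Figure 1; the sketch only mirrors the advice above and is not the server's code.

```python
def partition_by_hspsize(small_rnas, hspsize=20):
    """Split {name: sequence} into (analyzed, skipped) for a given hspsize.

    Small RNAs shorter than hspsize are skipped, as described in the text.
    """
    analyzed = {n: s for n, s in small_rnas.items() if len(s) >= hspsize}
    skipped = {n: s for n, s in small_rnas.items() if len(s) < hspsize}
    return analyzed, skipped


def suggested_hspsize(small_rnas, default=20, minimum=15, maximum=30):
    """Lower hspsize to the shortest small RNA, within the allowed range."""
    shortest = min(len(s) for s in small_rnas.values())
    return max(minimum, min(default, shortest, maximum))
```

For example, with one 21-nt and one 18-nt small RNA, the default of 20 would skip the 18-nt sequence, whereas lowering hspsize to 18 keeps both in the analysis.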

Output 

After each successful submission, the end user will be 
provided a unique URL to trace the analysis progress or 
check final results at any time. Once the submitted analysis 
is completed, the psRNATarget server lists the details of 
the potential small RNA/target site pairs page by page, 
with comprehensive query and sort tools at the top of 
each output page so that users can easily browse through the 
results (Figure 2). In addition, psRNATarget allows 
users to download the entire results in a tab-delimited 
text file, which is critical for large-scale data analysis. 
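A downloaded tab-delimited file can be post-processed with a few lines of Python. The column names used here (`small_rna`, `target`, `expectation`, `upe`) are hypothetical, since the actual header of the download is not given in this text, and the sample rows are synthetic placeholders rather than server output.

```python
import csv
import io

# Synthetic placeholder rows; the real column names may differ.
SAMPLE = (
    "small_rna\ttarget\texpectation\tupe\n"
    "sRNA_1\ttranscript_A\t1.0\t12.5\n"
    "sRNA_1\ttranscript_B\t2.5\t30.0\n"
    "sRNA_2\ttranscript_C\t0.5\t20.0\n"
)


def load_pairs(handle):
    """Read tab-delimited rows and convert the numeric columns to float."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        {**row, "expectation": float(row["expectation"]), "upe": float(row["upe"])}
        for row in reader
    ]


def filter_pairs(pairs, max_expectation=3.0, max_upe=25.0):
    """Keep pairs under both cut-offs, best (lowest) expectation first."""
    kept = [p for p in pairs
            if p["expectation"] <= max_expectation and p["upe"] <= max_upe]
    return sorted(kept, key=lambda p: (p["expectation"], p["upe"]))


pairs = load_pairs(io.StringIO(SAMPLE))
for p in filter_pairs(pairs):
    print(p["small_rna"], p["target"], p["expectation"], p["upe"])
```

For a real download, `io.StringIO(SAMPLE)` would be replaced by an open file handle and the column names adjusted to match the file's header line.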

PERFORMANCE 

psRNATarget searches for target genes based on both 
complementarity scoring analysis and secondary structure 
analysis. We demonstrate its performance by predicting 
target genes of 10 published Arabidopsis thaliana 
miRNAs (Supplementary Data 1) and comparing those 
predicted target genes to the experimentally validated 
targets reported in the literature. With the default parameters 
(Expectation < 3, a slightly relaxed threshold), psRNATarget 
found 92 target candidates (Supplementary Data 2) in the 
TAIR9 cDNA library for the 10 miRNAs, which include 
all 46 validated target genes (100% coverage rate) at a 
50.0% potential false positive prediction rate. With a 
more stringent cut-off threshold (Expectation < 2), 



Figure 1 panel labels: (A) Maximum expectation (range: 0-5, less is better); (B) Length for complementarity scoring (hspsize): 20 (range: 15-30 bp); (C) Target accessibility, maximum energy to unpair the target site (UPE): 25.0 (range: 0-100, less is better); (D) Flanking length around target site for target accessibility analysis: 17 bp in upstream / 13 bp in downstream; (E) Range of central mismatch leading to translational inhibition (nt). 

Figure 1. A set of parameters for adjusting the behavior of back-end pipeline of psRNATarget server. 
Figure 2 contents: a keyword search box (e.g. AT1G27360, miR156, transcription factor), Expectation and UPE filters, Batch Download, sort and page controls, and a results table with the columns miRNA Acc., Target Acc., Expectation (E), Target Accessibility (UPE), Alignment, Target Description, Inhibition and Multiplicity. Example rows: ath-miR156a against AT1G27360.1 (target positions 1253-1272) and AT1G27360.2 (target positions 1213-1232), squamosa promoter-binding protein-like 11 (SPL11), each with Expectation 1.0. 

Figure 2. A list of comprehensive miRNA/target site pairs along with query and sort tools on top of each output page. 



psRNATarget detected 52 target candidates. Of them, 38 
genes have reportedly been validated by 5'-RACE technology, 
which covers 82.6% of the validated target genes at a 
26.9% potential false positive prediction rate. These 
results indicate that psRNATarget is able to systematically 
identify target transcripts, and that users may trade 
prediction coverage against the false positive 
prediction rate by choosing different thresholds. 
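The coverage and potential false positive rates quoted above follow directly from the reported counts. The short sketch below reproduces the arithmetic; the function name `prediction_rates` is hypothetical, and 'potential false positive' is taken to mean a predicted candidate that is not among the validated targets.

```python
def prediction_rates(n_candidates, n_validated_found, n_validated_total):
    """Return (coverage %, potential false positive %) rounded to 0.1."""
    coverage = n_validated_found / n_validated_total
    potential_fp = (n_candidates - n_validated_found) / n_candidates
    return round(100 * coverage, 1), round(100 * potential_fp, 1)


# Expectation < 3: 92 candidates, 46 of 46 validated targets found.
print(prediction_rates(92, 46, 46))   # (100.0, 50.0)
# Expectation < 2: 52 candidates, 38 of 46 validated targets found.
print(prediction_rates(52, 38, 46))   # (82.6, 26.9)
```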

One of the popular applications for psRNATarget is 
to search a transcript library for target genes of small 
RNAs sequenced by next-generation technology 
(28,29). In a performance test, the published small RNA 
dataset from the Arabidopsis Small RNA Project 
(http://asrp.cgrb.oregonstate.edu/), which consists of around 206000 
small RNAs, was submitted to search against the 
Arabidopsis TAIR9 transcript database 
(http://www.arabidopsis.org/). psRNATarget took 1 h 54 min 
to complete the whole analytic procedure running on a 
Linux cluster equipped with 264 cores (52 AMD Opteron 
processors) and generated around 2.5 million small RNA/ 
target site pairs using default parameter values, except that 
the expectation cutoff value was set to <4 to generate a large 
number of matching pairs. These benchmark results 
indicate that psRNATarget is well capable of performing 
high-throughput analysis of large-scale datasets, such as 
next-generation sequencing (NGS) data. 



DISCUSSION 

Complementarity and target-site accessibility have been 
proven to be the two key factors in the plant regulatory 
small RNA (miRNA and ta-siRNA) target recognition 
mechanism (19-21). Both factors have been incorporated 
to improve the analysis of miRNA target genes in 
Arabidopsis (6); however, there is no published general-purpose 
tool that is able to evaluate these factors for plant regulatory small 
RNA target analysis. The psRNATarget 
server successfully integrates two well-proven approaches 
(19,27) to evaluate the above two factors; both 
approaches have been widely applied and well validated 
by experiments, which supports the analytic quality of the 
psRNATarget server. 